Lahar: Warehousing Markovian Streams

Authors

  • Julia Maureen Letchner
  • Magdalena Balazinska
  • Gaetano Borriello
  • Dan Suciu
Abstract

Chair of the Supervisory Committee: Professor Magdalena Balazinska, Computer Science and Engineering

A huge amount of the world’s data is both sequential and low-level. Many applications consume higher-level information, such as words and sentences, that is inferred from low-level sequences, such as raw audio signals, using a model (e.g., a hidden Markov model). This inference process is typically statistical, resulting in high-level streams that are imprecise. These imprecise streams, once archived, are useful for analytics, including sequence-finding event queries (e.g., “Find all times when the phrase ‘Barack Obama...veto’ occurs in the NPR news podcast from July 9.”), event query aggregates (e.g., “How many times do 2008 NPR podcasts use the phrase ‘Barack Obama...veto’?”), and event query lineage (e.g., “What words appeared between the word ‘Obama’ and ‘veto’ in the previous query?”). These queries are difficult to support efficiently because archives can be large and standard relational warehouses cannot support analytics on the rich semantics of imprecise sequences; however, these analytics are critical for allowing applications to leverage this data effectively. In this thesis, we introduce Lahar, the first database system for a common type of imprecise, sequential model called a Markovian stream. Lahar includes novel algorithms for efficiently processing aggregated event queries and event query lineage. Lahar accelerates performance and scalability of all queries using several techniques, including a set of novel Markovian stream indices and novel methods for approximating Markovian streams. Through experiments on two real-world datasets (one collected from an office-building RFID deployment and the other collected from audio podcasts), we demonstrate that Lahar is an efficient Markovian stream warehousing system.
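To make the idea of a sequence-finding event query over an imprecise stream concrete, the following is a minimal Python sketch. It is not Lahar's algorithm or storage format. It assumes, purely for illustration, that a Markovian stream is stored as an initial distribution over states plus one conditional probability table per timestep, and it computes the probability that a fixed pattern of states occurs at consecutive timesteps. The names `marginals`, `match_probability`, and the toy states are hypothetical.

```python
from typing import Dict, List, Sequence

Marginal = Dict[str, float]                 # P(state) at one timestep
Transition = Dict[str, Dict[str, float]]    # P(next state | current state)


def marginals(initial: Marginal, transitions: List[Transition]) -> List[Marginal]:
    """Propagate the initial distribution forward through every timestep."""
    result = [dict(initial)]
    for cpt in transitions:
        nxt: Marginal = {}
        for state, p in result[-1].items():
            for next_state, q in cpt.get(state, {}).items():
                nxt[next_state] = nxt.get(next_state, 0.0) + p * q
        result.append(nxt)
    return result


def match_probability(initial: Marginal, transitions: List[Transition],
                      start: int, pattern: Sequence[str]) -> float:
    """Probability that the stream takes exactly the states in `pattern`
    at timesteps start, start + 1, ... (uses the Markov property)."""
    margs = marginals(initial, transitions)
    if not pattern or start + len(pattern) > len(margs):
        return 0.0
    prob = margs[start].get(pattern[0], 0.0)
    for offset in range(1, len(pattern)):
        cpt = transitions[start + offset - 1]
        prob *= cpt.get(pattern[offset - 1], {}).get(pattern[offset], 0.0)
    return prob


# Toy three-timestep stream (assumed data, not from the thesis).
initial = {"obama": 0.5, "other": 0.5}
transitions = [
    {"obama": {"veto": 0.8, "other": 0.2}, "other": {"other": 1.0}},
    {"veto": {"other": 1.0}, "other": {"obama": 0.5, "other": 0.5}},
]

p_obama_veto = match_probability(initial, transitions, 0, ["obama", "veto"])
p_other_obama = match_probability(initial, transitions, 1, ["other", "obama"])
```

In this toy stream, `p_obama_veto` is P(obama at t=0) times P(veto at t=1 | obama at t=0). Multiplying the two marginals instead would ignore the correlation between adjacent timesteps, which is the kind of sequence semantics the abstract says standard relational warehouses do not capture.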

Related resources

Lahar Demonstration: Warehousing Markovian Streams

Lahar is a warehousing system for Markovian streams—a common class of uncertain data streams produced via inference on probabilistic models. Example Markovian streams include text inferred from speech, location streams inferred from GPS or RFID readings, and human activity streams inferred from sensor data. Lahar supports OLAP-style queries on Markovian stream archives by leveraging novel appro...

Towards Real-Time Data Stream Processing

Many applications require the continuous tracking of the state of a system in order to detect the occurrence of a particular event. RFID sensors, in particular, have become an increasingly popular means of gathering tracking information about the objects of interest. The need to query these data has spurred research at the intersection of sensor networks and databases. There are a number of cha...

Approximation Trade-Offs in a Markovian Stream Warehouse: An Empirical Study UW TR: #UW-CSE-09-07-03

A large amount of the world’s data is both sequential and low-level. Many applications need to query higher-level information (e.g., words and sentences) that is inferred from these low-level sequences (e.g., raw audio signals) using a model (e.g., a hidden Markov model). This inference process is typically statistical, resulting in high-level sequences that are imprecise. Once archived, these ...

Approximation trade-offs in a Markovian stream warehouse: An empirical study

A large amount of the world’s data is both sequential and low-level. Many applications need to query higher-level information (e.g., words and sentences) that is inferred from these low-level sequences (e.g., raw audio signals) using a model (e.g., a hidden Markov model). This inference process is typically statistical, resulting in high-level sequences that are imprecise. Once archived, these ...

Data Stream Warehousing In Tidalrace

Big data is a ubiquitous feature of large modern enterprises. Many organizations generate huge amounts of on-line streaming data – examples include network monitoring, Twitter feeds, financial data, and industrial application monitoring. Making effective use of these data streams can be challenging. While Data Stream Management Systems can provide support for realtime alerting and data reductio...

Publication date: 2010